The ProDuSe Pipeline¶

The ProDuSe pipeline consists of multiple steps, each of which is described briefly here.

Config¶

Sets up output directories, symbolic links to files, and configuration files for each step in the ProDuSe pipeline.
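As a rough illustration of this kind of setup, the sketch below creates one directory per step, symlinks the input files, and writes one configuration file per step. The directory layout, the `.ini` format, and the `setup_run` name are assumptions made for this example; they are not taken from ProDuSe itself.

```python
import configparser
from pathlib import Path


def setup_run(output_dir, input_files, steps):
    """Create a run directory with symlinked inputs and one config file per step.

    The layout (data/, config/, one folder per step) is hypothetical.
    """
    out = Path(output_dir)
    data_dir = out / "data"
    config_dir = out / "config"
    data_dir.mkdir(parents=True, exist_ok=True)
    config_dir.mkdir(exist_ok=True)

    # Link the inputs rather than copying them, so large files are not duplicated.
    for path in input_files:
        link = data_dir / Path(path).name
        if not link.exists() and not link.is_symlink():
            link.symlink_to(Path(path).resolve())

    # One output directory and one config file per pipeline step.
    for step in steps:
        step_dir = out / step
        step_dir.mkdir(exist_ok=True)
        config = configparser.ConfigParser()
        config[step] = {"output_dir": str(step_dir)}
        with open(config_dir / f"{step}.ini", "w") as handle:
            config.write(handle)
    return out
```

Calling `setup_run("run1", ["sample_R1.fastq", "sample_R2.fastq"], ["trim", "collapse"])` would leave every later step with a predictable place to read its settings and write its output.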

Trim¶

Trims the adapter sequence from each read and prepends it to the read name. Reads that do not match the barcode sequence within the allowed mismatch range are discarded.

Input: Paired fastq files

Output: Trimmed paired fastq files

Note

Trim can be used to demultiplex samples, provided that the barcodes corresponding to each sample are sufficiently distinct.
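The trimming logic described above can be sketched as follows. This is a minimal illustration, not the code in trim.py: the IUPAC-style barcode mask, the `max_mismatch` parameter, and the `:` separator used in the read name are all assumptions made for the example.

```python
from typing import Optional, Tuple

# IUPAC codes mapped to the bases they permit (a subset, enough for this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "N": "ACGT",
}


def count_mismatches(adapter: str, mask: str) -> int:
    """Count positions where the observed base is not permitted by the mask."""
    return sum(base not in IUPAC[code] for base, code in zip(adapter, mask))


def trim_read(name: str, seq: str, qual: str, mask: str,
              max_mismatch: int) -> Optional[Tuple[str, str, str]]:
    """Return the trimmed (name, seq, qual), or None if the read is discarded."""
    length = len(mask)
    if len(seq) < length:
        return None  # too short to contain the adapter
    adapter = seq[:length]
    if count_mismatches(adapter, mask) > max_mismatch:
        return None  # outside the allowed mismatch range
    # Keep the adapter in the read name so later steps can still see it.
    return f"{adapter}:{name}", seq[length:], qual[length:]
```

With a mask of `NNNT` and no mismatches allowed, a read starting with `ACGT` is kept and trimmed, while one starting with `ACGA` is discarded. Demultiplexing would amount to running the same check once per sample mask.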

See also

trim.py

bwa¶

Maps provided reads to a reference genome using the Burrows-Wheeler Aligner.

Input: Trimmed paired fastq files

Output: BAM file consisting of trimmed reads
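For orientation, an alignment step of this kind could be driven from Python as in the sketch below. It assumes `bwa` and `samtools` are installed and on the `PATH`, and that the reference has already been indexed with `bwa index`. The use of `bwa mem` piped into `samtools sort`, and the function names, are assumptions for this example; the options that bwa.py actually passes may differ.

```python
import subprocess


def build_commands(reference, fastq1, fastq2, out_bam, threads=1):
    """Build the bwa and samtools command lines without running them."""
    bwa_cmd = ["bwa", "mem", "-t", str(threads), reference, fastq1, fastq2]
    # "-" tells samtools sort to read the alignments from standard input.
    sort_cmd = ["samtools", "sort", "-o", out_bam, "-"]
    return bwa_cmd, sort_cmd


def align_to_bam(reference, fastq1, fastq2, out_bam, threads=1):
    """Align paired fastq files and write a coordinate-sorted BAM file."""
    bwa_cmd, sort_cmd = build_commands(reference, fastq1, fastq2, out_bam, threads)
    with subprocess.Popen(bwa_cmd, stdout=subprocess.PIPE) as bwa:
        subprocess.run(sort_cmd, stdin=bwa.stdout, check=True)
    if bwa.returncode != 0:
        raise subprocess.CalledProcessError(bwa.returncode, bwa_cmd)
```

Keeping command construction separate from execution makes the command lines easy to inspect or log before anything is run.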

See also

bwa.py

Collapse¶

Collapses duplicate reads into a consensus sequence.

Input: Trimmed BAM file

Output: Paired collapsed (consensus) fastq files
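To make the idea of collapsing concrete, the sketch below groups reads into families and takes a per-base majority vote. Grouping by barcode and start position, requiring equal-length sequences, and writing `N` on ties are simplifying assumptions for this example; the rules collapse.py applies are not described here and may be more involved.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Majority vote at each position; ties are reported as N."""
    bases = []
    for column in zip(*seqs):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            bases.append("N")  # no clear majority at this position
        else:
            bases.append(ranked[0][0])
    return "".join(bases)


def collapse(reads):
    """Collapse (barcode, start, seq) tuples into one consensus per family."""
    families = defaultdict(list)
    for barcode, start, seq in reads:
        families[(barcode, start)].append(seq)
    return {key: consensus(seqs) for key, seqs in families.items()}
```

Three reads sharing barcode `AAC` at position 100, two reading `ACGT` and one reading `ACGA`, would collapse to a single `ACGT` consensus, so the isolated `A` is treated as an error rather than as a variant.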

See also

collapse.py

bwa¶

Maps provided reads to a reference genome using the Burrows-Wheeler Aligner.

Input: Collapsed paired fastq files

Output: BAM file consisting of collapsed reads

See also

bwa.py

Stitcher¶

Merges the forward and reverse reads of a pair into a single consensus sequence if they overlap. This step is used to correct errors in the overlapping bases.

Input: Collapsed BAM file

Output: Stitched (forward and reverse reads are merged) BAM file
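The merging idea can be illustrated with the sketch below. It assumes both reads are gap-free, expressed in reference orientation, and described by a start position plus per-base qualities. Resolving disagreements in favour of the higher-quality base, with `N` on equal quality, is an assumption for this example and not necessarily the rule Stitcher applies.

```python
def stitch(fwd_start, fwd_seq, fwd_qual, rev_start, rev_seq, rev_qual):
    """Merge two overlapping reads; return (start, seq, quals) or None."""
    fwd_end = fwd_start + len(fwd_seq)
    rev_end = rev_start + len(rev_seq)
    if max(fwd_start, rev_start) >= min(fwd_end, rev_end):
        return None  # the reads do not overlap, so nothing is merged

    merged = {}
    for i, (base, qual) in enumerate(zip(fwd_seq, fwd_qual)):
        merged[fwd_start + i] = (base, qual)
    for i, (base, qual) in enumerate(zip(rev_seq, rev_qual)):
        pos = rev_start + i
        if pos not in merged:
            merged[pos] = (base, qual)
            continue
        fwd_base, fwd_q = merged[pos]
        if fwd_base == base:
            merged[pos] = (base, max(fwd_q, qual))  # both reads agree
        elif qual > fwd_q:
            merged[pos] = (base, qual)              # reverse read is more reliable
        elif qual == fwd_q:
            merged[pos] = ("N", 0)                  # unresolvable disagreement
        # otherwise the forward base is kept

    start = min(merged)
    positions = range(start, max(merged) + 1)
    seq = "".join(merged[p][0] for p in positions)
    quals = [merged[p][1] for p in positions]
    return start, seq, quals
```

In this sketch a low-quality base at the end of the forward read is overruled by a confident base from the reverse read, which is how overlap can be used to correct errors.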

See also

Stitcher Documentation

SplitMerge¶

Splits reads merged by Stitcher back into forward and reverse reads. The shared bases are assigned to one of the two reads.

Input: Stitched BAM file

Output: Unsorted BAM file in which the merged reads have been split
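A minimal sketch of the splitting step is shown below. It assumes the overlap is known as a pair of offsets into the merged sequence, and the `assign_to` parameter is a hypothetical way of choosing which read receives the shared bases; how SplitMerge.py makes that choice is not stated here.

```python
def split_merged(seq, overlap_start, overlap_end, assign_to="forward"):
    """Split a merged read into (forward, reverse) parts.

    The shared bases seq[overlap_start:overlap_end] go to exactly one read.
    """
    if not 0 <= overlap_start <= overlap_end <= len(seq):
        raise ValueError("overlap must lie within the merged sequence")
    if assign_to not in ("forward", "reverse"):
        raise ValueError("assign_to must be 'forward' or 'reverse'")
    cut = overlap_end if assign_to == "forward" else overlap_start
    return seq[:cut], seq[cut:]
```

Because each shared base ends up in only one of the two reads, a downstream counter does not see the same fragment twice at overlapping positions.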

See also

SplitMerge.py

SNV¶

Identifies positions at which one or more bases support an alternate allele.

Input: A BAM file containing de-stitched reads

Output: A VCF file listing all positions with non-reference bases, and the count of these bases
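The counting described above can be sketched in plain Python as follows. Reads are simplified to gap-free `(start, sequence)` pairs with 0-based positions, and the result is a dictionary rather than a VCF file; these simplifications, and the function name, are assumptions for this example and do not reflect how snv.py reads BAM input.

```python
from collections import Counter, defaultdict


def candidate_positions(reference, reads):
    """Return positions where at least one base differs from the reference."""
    counts = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1

    candidates = {}
    for pos in sorted(counts):
        ref_base = reference[pos]
        # Ignore N calls: they carry no evidence for any allele.
        alt = {b: n for b, n in counts[pos].items() if b not in (ref_base, "N")}
        if alt:
            candidates[pos] = {
                "ref": ref_base,
                "depth": sum(counts[pos].values()),
                "alt": alt,
            }
    return candidates
```

Every position with even a single non-reference base is reported at this stage, which is why a separate filtering step is needed afterwards.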

See also

snv.py

Filter¶

Filters candidate variant calls based upon overall capture space and locus characteristics.

Input: A VCF file listing candidate variants

Output: A VCF file listing filtered variants
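As a simple illustration of filtering candidates, the sketch below keeps a call only if it has enough supporting bases and a high enough variant allele fraction. Both thresholds and the record fields are hypothetical stand-ins chosen for this example; the capture-space and locus characteristics that filter_produse.py evaluates are not specified here.

```python
def filter_candidates(candidates, min_alt_count=2, min_vaf=0.01):
    """Keep candidates with enough alternate support (illustrative thresholds)."""
    kept = []
    for call in candidates:
        depth = call["depth"]
        vaf = call["alt_count"] / depth if depth else 0.0
        if call["alt_count"] >= min_alt_count and vaf >= min_vaf:
            kept.append(call)
    return kept
```

A candidate supported by a single base, or by three bases out of a thousand, would be dropped under these example thresholds, while one supported by five bases out of a hundred would pass.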

See also

filter_produse.py